The Anjials of Applied Statistics 
2011, Vol. 5, No. 3, 2078-2108 
DOI: 10.1214/11-AOAS459 

@ Institute of Mathematical Statistics, 2011 



SEMIPARAMETRIC MODELING OF AUTONOMOUS NONLINEAR 
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University of California, Davis 

We propose a semiparametric model for autonomous nonlinear 
dynamical systems and devise an estimation procedure for model 
fitting. This model incorporates subject-specific effects and can be 
viewed as a nonlinear semiparametric mixed effects model. We also 
propose a computationally efficient model selection procedure. We 
show by simulation studies that the proposed estimation as well as 
model selection procedures can efficiently handle sparse and noisy 
measurements. Finally, we apply the proposed method to a plant 
growth data used to study growth displacement rates within meris- 
tems of maize roots under two different experimental conditions. 

1. Introduction. Continuous time dynamical systems arise, among other 
places, in modeling certain biological processes. This includes classical exam- 
ples from population biology like the Lotka-Voltera equations for describing 
prey-predator dynamics [Perthame (2007)], or subject-specific processes like 
the progression of infectious diseases in individuals [Nowak and May (2000)]. 
Most of the existing approaches estimate the dynamical system by assum- 
ing known functional forms of the system. Moreover, many of them aim at 
estimating individual dynamics for one subject. However, in many scientific 
studies, there is a need to model the dynamical system nonpar ametrically 
due to insufficient knowledge of the problem at hand. In addition, there 
could be an interest to know the dynamics of a certain process at a pop- 
ulation level in order to answer various scientific questions. Thus, in this 
paper, we propose a new method to bridge the gap and tackle these chal- 
lenges. 
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Fig. 1. Root tip. Left panel: image of root tip with meristem* : 1 — meristem; 4 — root cap; 
5 — elongation zone. Right panel: an illustration of the root tip with the displacements of 
three markers Ml, M2, AI3 indicated at times t(i,ti,t2,t-j,. ("From wikipedia.) 

To motivate the model, we first briefly discuss a study on plant growth. 
There is a lot of research aiming to understand the effect of environmental 
conditions on the growth in a plant. For example, root growth in plants is 
highly sensitive to environmental factors such as temperature, water deficit 
or nutrients [Schurr, Walter and Rascher (2006); Walter et al. (2002)]. In 
Sacks, Silk and Burman (1997), an experiment is conducted to study the ef- 
fect of water stress on cortical cell division rates through growth displacement 
rate within the meristem of the primary root of maize seedlings (Figure 1: 
left panel). In this study, for each plant, measurements are taken on the 
displacement, measured as the distance in millimeters from the root cap 
junction (root apex) of a number of markers on the root over a period of 
12 hours (Figure 1: right panel). The growth displacement rate is defined 
as the rate of displacement of a particle placed along the root and, thus, it 
is a function of distance from the root apex. By its definition, growth dis- 
placement rate characterizes the relationship between the growth trajectory 
and its derivative (with respect to time). Therefore, it is the gradient func- 
tion in the corresponding dynamical system. In this study there is a need 
to understand the dynamics at the population level, while accounting for 
subject-specific variations, in order to compare the growth displacement 
rates under two different water conditions. 

Motivated by this study, in this paper, we focus on modeling and fitting 
the underlying dynamical system based on data measured over time, referred 
to as sample curves or sample paths, for a group of subjects. Moreover, for 
a given sample curve, instead of observing the whole sample path, measure- 
ments are taken only at a sparse set of time points together with possible 
measurement noise. In the plant data that we just mentioned, each plant is 
a subject, and the positions of the markers which are located at different 
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distances at time zero from the root cap junction correspond to different 
initial conditions. Each marker corresponds to one displacement trajectory 
(also referred to as growth trajectory/curve), and the number of measure- 
ments varies from two to seventeen, with measurement times varying across 
trajectories. (See Section 5 for a more detailed description.) 

We first give a brief overview of the existing literature on fitting smooth 
deterministic dynamical systems in continuous time. A large number of 
physical, chemical or biological processes are modeled through systems of 
parametric differential equations [Ljung and Glad (1994); Perthame (2007); 
Strogatz (2001)]. For example, Ramsay et al. (2007) consider modeling a con- 
tinuously stirred tank reactor and propose a method called parameter cas- 
cading for model fitting. Zhu and Wu (2007) adopt a state space approach 
for estimating the dynamics of cell-virus interactions in an AIDS clinical 
trial. Ramsay and Silverman (2002, 2005) consider fitting dynamical sys- 
tems given by systems of linear differential equations where the coefficients 
of the differential operator may be time varying. They propose methods 
for estimating these (linear) differential operators based on principal dif- 
ferential analysis when the data are recorded at dense and regular time 
points. Poyton et al. (2006) also use the principal differential analysis ap- 
proach to fit dynamical systems. Chen and Wu (2008b, 2008a) propose to 
estimate parametric differential equations with known functional forms and 
time-dependent parameters through a two-stage approach where the first 
stage involves estimation of the sample trajectories and their derivatives by 
nonparametric smoothing. Brunei (2008) gives a comprehensive theoretical 
analysis of such an approach. Cao, Fussmann and Ramsay (2008) propose 
a method for fitting nonlinear dynamical systems using splines with prede- 
termined knots for describing the gradient function. This involves knowing 
the functional form of the differential equation and does not include any 
subject-specific effects. Wu and Ding (1999) and Wu, Ding and DeGruttola 
(1998) propose using the nonlinear least squares procedure for fitting para- 
metric differential equations that take into account subject-specific effects. 

For the problems that we address in this paper, measurements are taken 
on a sparse set of points for each sample curve so that estimation of individ- 
ual sample trajectory or its derivative based on nonparametric smoothing is 
error-prone and results in a loss of information. Thus, numerical procedures 
for solving differential equations can become unstable if we treat each sample 
curve separately. Moreover, we are more interested in estimating the base- 
line dynamics at the population level than the individual dynamics of each 
subject. For example, in the plant study described above, we are interested 
in comparing the growth displacement rates under two different experimen- 
tal conditions. On the other hand, we are not so interested in the individual 
displacement rate corresponding to each plant. Another important aspect in 
modeling data with multiple subjects is that adequate measures need to be 
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taken to model possible subject-specific effects, otlierwise the estimates of 
the model parameters can have inflated variability. In this paper we propose 
a semiparametric approach for modeling dynamical systems which incor- 
porates subject-specific effects while combining information across different 
subjects. A nonparametric model is often essential because of insufficient 
knowledge about the problem to suggest a reasonable parametric form of 
the dynamical system. In addition, if realistic parametric models can be 
proposed, then the nonparametric fit can be used for diagnostics of lack of 
fit, for example, by employing a distance measure between the parametric 
and nonparametric fits and studying its sampling variability. We propose 
an estimation procedure that combines nonlinear optimization techniques 
with a numerical ODE (ordinary differential equation) solver to estimate 
the unknown parameters. In addition, we derive a computationally efficient 
approximation of the leave-one-curve-out cross-validation score for model 
selection. We show by simulation studies that the proposed approach can 
efficiently estimate the baseline dynamics with noisy and sparsely measured 
sample curves. Finally, we apply the proposed method to the plant data and 
compare the estimated growth displacement rates under the two experimen- 
tal conditions and discuss some scientific implications of the results. 

To the best of our knowledge, modeling and fitting dynamical systems 
nonparametrically while also allowing for subject-specific effects is new in the 
literature. In particular, our model differs from traditional nonlinear mixed 
effects models previously employed for fitting differential equations, which 
are almost exclusively parametric [Wu, Ding and DeGruttola (1998); Wu 
and Ding (1999); Guedj, Thiebaut and Commenges (2007); Li et al. (2002)]. 
For example, Guedj, Thiebaut and Commenges (2007) consider a nonlinear 
state-space model where the state variable follows a parametric differen- 
tial equation with subject-specific effects, and the parameters are estimated 
through a maximum likelihood approach. In contrast, for the model pro- 
posed in this paper, the form of the gradient function g is not assumed to 
be known and it is approximated in a sequence of bases with growing di- 
mension. Note that this gives rise to a sequence of parametric models with 
increasing complexity, and one needs to adopt a model selection procedure to 
select an appropriate model, as is typical in nonparametric function estima- 
tion. The theoretical derivations in Paul, Peng and Burman (2009) also show 
that the problem of estimating the gradient function g nonparametrically is 
intrinsically different from that under a parametric nonlinear mixed-effects 
model. 

The rest of the paper is organized as follows. In Section 2 we describe 
the proposed model. In Section 3 we discuss the model fitting and model 
selection procedures. In Section 4 we conduct simulation studies to illustrate 
finite sample performance of the proposed method and compare the proposed 
method with a two-stage procedure. In Section 5 we apply this method to 
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the plant data. Section 6 has a brief discussion. More details and additional 
simulation results are reported in the supplementary material [Paul, Peng 
and Burman (2011)]. 

2. Model. In this section we describe a class of autonomous dynamical 
systems that is suitable for modeling the problems discussed in Section 1. 
An autonomous dynamical system has the following general form: 

X'{t) = f{X{t)), te[To,Ti]. 

Without loss of generality, henceforth Tq = and Ti = 1. Note that the above 
equation implies that X{t) = a + f{X{u)) du, where a = X{0) is the initial 
condition. In an autonomous system, the dynamics, which is characterized 
by /, depends on time t only through the "state" X{t). This type of system 
arises in various scientific studies such as modeling prey-predator dynamics, 
virus dynamics or epidemiology [Perthame (2007)]. 

In this paper we consider the following class of autonomous dynamical 
systems: 

(2.1) Xii{t)=gi{Xa{t)), I = 1, . . . , Ni, i = 1, . . . ,n, 

where {Xii{t) : t G [0, 1], / = 1, . . . , Ni;i = 1, . . . , n} is a collection of smooth 
curves corresponding to n subjects, where A'^j > 1 is the number of curves 
associated with the ith subject. For example, in the plant study, each plant 
is a subject and each marker corresponds to one growth curve and there are 
multiple markers for each plant. We assume that all the curves associated 
with the same subject follow the same dynamics and are only differenti- 
ated by different initial conditions. These are described by the functions 
{9i{')}7=i- ^^^^ paper we model {5i(-)}r=i 

(2.2) g^{.) = e'^g{-), i = l,...,n, 
where: 

(1) the function g{-) reflects the common underlying mechanism regulat- 
ing all these dynamical systems. It is assumed to be a smooth function and 
is referred to as the gradient function. 

(2) 6i's reflect subject-specific effects in these systems. The mean of 9i's 
is assumed to be zero to impose identifiability. 

Note that one may view the trajectories for each plant as multivariate func- 
tional data. However, here for each subject, the different trajectories corre- 
spond to different initial conditions of the same ODE describing the system. 
This means that given the initial condition and the subject-specific scaling 
parameter 6i, the corresponding trajectory is completely determined by the 
underlying dynamical system and the only source of randomness is from 
measurement errors. 
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Fig. 2. Empirical derivatives X'{t) against empirical fits X{t) for the treatment group 
in the plant growth data. 

The simplicity and generality of this model make it appealing for model- 
ing a wide class of dynamical systems. First, the gradient function g{-) can 
be an arbitrary smooth function. Second, the scale parameter e^' provides 
a subject-specific tuning of the dynamics. This is motivated by the fact that, 
for a large class of problems, the variations of the dynamics in a population 
are in the scale of the rate of change rather than in the shape of the gradient 
function. For example, for the plant data, by examining the scatter plot of 
empirical derivatives versus empirical fits (Figure 2, for more details, see 
Section 5), we observe an excessive variability toward the end which reflects 
plant-specific scaling effects. Moreover, the above model is also flexible in in- 
corporating time-independent covariates, say, Zi, for example, by expressing 

T 

the scaling factor as for some parameter r]. In this paper our primary 
goal is to estimate the gradient function g nonparametrically. 

Assuming the gradient function g to be smooth means that it can be 
well-approximated by a basis representation approach: 



where (/>i,Af (•), . . . ,4'm,m{-) are linearly independent basis functions, chosen 
so that their combined support covers the range of the observed trajecto- 
ries. For example, we can use cubic splines with a suitable set of knots. 
Thus, for a given choice of the basis functions, the unknown parameters in 
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the model are the basis coefficients /3 := . . . , (^m)'^ , the scale parame- 
ters 9 := and possibly the initial conditions a := {an := Xj/(0) : / = 
1, . . . , A^jj^j^. Also, various model parameters, such as the number of ba- 
sis functions M and the knot sequence, need to be selected based on the 
data. Therefore, in essence, this is a nonlinear, semiparametric, mixed ef- 
fects model. 

3. Model fitting. 

3.1. Estimation procedure. In this section we propose an estimation pro- 
cedure based on sparsely observed noisy data. Specifically, we assume that 
the observations are given by 

(3.1) Yuj = Xii{tuj)+eiij, j = l,...,mii, 

where < tm < • • • < tn^.^ < 1 are the measurement times for the Ith curve 
of the zth subject, and {euj} are independently and identically distributed 
noise with mean zero and variance > 0. For model fitting with such data, 
we adopt an iterative updating procedure which imposes regularization on 
the estimates of 6 and a. One way to achieve this is to treat them as un- 
known random parameters from some parametric distributions. Specifically, 
we use the following set of working assumptions: (i) Oj^'s are independent and 
identically distributed as N[a,a'^) and 9iS are independent and identically 
distributed as A^(0, a^), for some a € M and > 0, (t| > 0; (ii) the noise euj 's 
are independent and identically distributed as N{0,a'^) for > 0; (iii) the 
three random vectors a, 6, e := {euj} are independent. Under these assump- 
tions, the negative joint log-likelihood of the observed data Y := {Ynj}, the 
scale parameters 9 and the initial conditions a is (up to an additive constant 
and a positive scale constant) 

n Ni mu n Ni n 

(3-2) Yl E Et^^'i - Xii{tiij;au,ei, (3)f + Ai Y^J^^au - af + A2 ^ ef, 

i=i 1=1 j=i i=i 1=1 i=i 

where Ai = a'^/a'^, A2 = a'^/a'^, and Xii{-) is the trajectory determined by an, 
6i and (5. This can be viewed as a hierarchical maximum likelihood approach 
[Lee, Nelder and Pawitan (2006)], which is considered to be a convenient 
alternative to the full (restricted) maximum likelihood approach. Define 

^iij {au,Oi, 13) := [Ynj - Xu{tiij;au,9i, (3)f + Xi{au - af/mu + \29f j ^ mu. 

1=1 

Then the loss function in (3.2) equals Yl^=i'Tld=iYl^=i^ilj{^il-i^i-iP)- Note 
that the above distributional assumptions are simply working assumptions. 
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since the expression in (3.2) can also be viewed as a regularized (.2 loss with 
penalties on the variability of and a. 

In many problems there are natural constraints on the gradient func- 
tion g. Some of these constraints can be expressed in the form of quadratic 
constraints in certain derivatives of g. Thus, to add flexibility to our estima- 
tion procedure, we allow for incorporating penalties of the form: /3"^B/3 for 
an M X M positive semi-definite matrix B in the loss function. Consequently, 
the modified objective function becomes 

n Ni mu 

(3.3) L{a,e,(3):=Y,J2Yl ^^'i («*^ Z^) + /3^B/3. 

i=i 1=1 j=i 

The proposed estimator is then the minimizer of the objective function: 

(3.4) (a,0,3):=argminL(a,0,/3). 

Note that here our main interest is the gradient function g. Thus, estimating 
the parameters of the dynamical system together with the sample trajecto- 
ries and their derivatives simultaneously is the most efficient. In contrast, in 
a two-stage approach, the trajectories and their derivatives are first obtained 
via pre-smoothing [see, e.g., Chen and Wu (2008b, 2008a); Varah (1982)], 
and then they are used in a nonparametric regression framework to derive 
an estimate of g. This is inefficient since estimation errors introduced in the 
pre-smoothing step effectively cause a loss of information. Indeed, simula- 
tion studies carried out in Section 4 and the supplementary material [Paul, 
Peng and Burman (2011)] show that two-stage estimators suffer from sig- 
nificant biases in estimating the gradient function g. Alternative ways of 
estimating g include using the reproducing kernel Hilbert space framework 
[Gu (2002)], and controlling the degree of smoothness of the fitted g by 
tuning a roughness penalty. 

In the following, we propose a numerical procedure for solving (3.4) that 
has two main ingredients: 

• Given (a, 6, f3), reconstruct the trajectories {Xn^-) -.1 = 1,..., Ni}"^^^ and 
their derivatives. This can be implemented using a numerical ODE solver, 
such as the Runge-Kutta method [Tenenbaum and Pollard (1985)]. 

• Minimize (3.3) with respect to (a, 0,/3). This amounts to a nonlinear re- 
gression problem. It can be carried out using either a specialized nonlinear 
least squares solver, like the Levenb erg-Mar quardt method, [Nocedal and 
Wright (2006)] or a general optimization procedure, such as the Newton- 
Raphson algorithm. 

The above fitting procedure bears similarity to the local, or gradient-based, 
methods discussed by Li et al. (2002), Guedj, Thiebaut and Commenges 
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(2007) and Miao et al. (2009), even though their works focus on paramet- 
ric ODEs. The main distinction of the proposed framework and those of Li 
et al. (2002) and Guedj, Thiebaut and Commenges (2007) hes in that, for 
the current setting, the complexity of the model is allowed to grow with 
increasing sample size and one eventually needs to adopt a model selection 
procedure to select an appropriate model (as is done in this paper). From 
purely a model-fitting point of view, nonlinear mixed-effects (NLME) model- 
based estimation procedures may be used in principle to fit each of these 
parametric submodels. The work of Ke and Wang (2001) on semiparametric 
mixed-effects model fitting also shares some common computational chal- 
lenges with our model. However, unlike in Ke and Wang (2001), in our case, 
the likelihood for the nonparametric component (i.e., the gradient function) 
is not available in closed form. 

We now briefly describe an optimization procedure based on the idea of 
the Levenberg-Marquardt method by linearization of {Xii(-)} with respect 
to an, 9i and /3. We break the updating step into three parts corresponding 
to the three different sets of parameters. For each set of parameters, we 
first derive a first order Taylor expansion of the curves {Xn} around their 
current values and then update them by a least squares fitting, while keeping 
the other two sets of parameters fixed at the current values. This process is 
repeated until convergence. 

For notational convenience, denote the current estimates by a* := {a*;}, 

0* := {6*} and /3*, and define the current residuals as Suj := Yiij—Xii(tiij;a*i, 
9^,(3*). For each i = 1, . . . ,n, and 1 = 1,... ,Ni, define the rrin x 1 column 
vectors 

( d ~ * * * \ ~ ~ m 

Jii,a*, ■= y-Q^Xii{tiij]a*i,9*,(3*)j , eii = {eiij)Y=i- 

For each i = 1, . . . ,n, define the mj. x 1 column vectors 

Ji,e* ■= y-QQ-^ii(.tuj; a*i,9* , l3*) J ; ?i = {eiij)f^'{ i^^, 

where mj. := "^j^i nin is the total number of measurements for the ith. sub- 
ject. For each k = l, . . . , M, define the m.. x 1 column vectors 

/ Q _ \mii,Ni,n 
\'-'Pk / j^i i^i i^i 

where m.. := XlILi S/^i total number of measurements. Note that, 

given a*, 6* and f3* , the trajectories {Xji(-)}'s and their gradients (as well 
as Hessians) can be easily evaluated on a fine grid by using numerical ODE 
solvers such as the fourth order Runge-Kutta method [see Paul, Peng and 
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Burman (2011) for details]. Since, given the trajectories, their gradients sat- 
isfy hnear differential equations, the solution may also be obtained explicitly 
(see the Appendix). The equation for updating /3, while keeping a* and 9* 
fixed, is 

[Jj V + ^3 diag(jj V) + B](/3 - /3*) = Jj? - B/3*, 

where Jg* := (Jg* : • • • : >//3*^) is an m.. x M matrix. Here A3 is a sequence of 
positive constants decreasing to zero as the number of iterations increases. 
They are used to avoid possible singularities in the system of equations. The 
normal equation for updating Oi is 

{Jle*'he* + X2){0i - e*) = J^g.Ei - A2^*, i = 1, . . . ,n. 

After updating ^j's, we re-center the current estimates such that their mean 
is set to be zero. This also helps in stabilizing the algorithm. The equation 
for updating an, while keeping 6i and /3 fixed at 6*, j3* is 

{Ju,a*Jii,a* + h){aii - a*i) = Ju,a*,^U + ^0*1, / = 1, . . . ,iVi, i = l,...,n, 

where a* := ^Li E£i a*i/N., = a* - o*, with N. := ^^^^ Ni being the 
total number of sample curves. Note that on convergence, a := a* provides 
an estimate of a. The initial estimates can be conveniently chosen. For ex- 
ample, a7 = Yai and 9'f = 0. 

This procedure is quite stable and robust to the initial parameter esti- 
mates. However, it converges slowly in the neighborhood of the minima of 
the objective function as it is a first order procedure. On the contrary, the 
Newton-Raphson algorithm has a fast convergence when starting from esti- 
mates that are already near the minima. Thus, in practice, one could first use 
the above approach (referred to as the Levenberg-Marquardt step hereafter) 
to obtain a reasonable estimate and then use the Newton-Raphson algo- 
rithm to expedite the search of the minima. The derivation of the Newton- 
Raphson algorithm is rather standard and thus is omitted. If the true gradi- 
ent function g has high complexity, and/or if either the 0j's or the noise are 
highly variable, the Newton-Raphson algorithm may be unstable, particu- 
larly when the initial conditions a = {Xj/(0)} are also estimated. Under such 
situations, we recommend using a (relatively) large number of Levenberg- 
Marquardt steps, followed by a one-step Newton-Raphson update. 

Note that the tuning parameter A3 plays a different role than the penalty 
parameters Ai and A2. The parameter A3 is used to stabilize the updates of (3 
and thereby facilitate convergence. Thus, it needs to decrease to zero with 
increasing iterations in order to avoid introducing bias in the estimate. In this 
paper, we simply set X^j = X^/j for the jth iteration, for some pre-specified 
A3 > 0. On the other hand, Ai and A2 are parts of the loss function (3.3). 
Their main role is to control the bias-variance trade-off of the estimators. 
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even though they also help in regularizing the optimization procedure. Prom 
the likelihood viewpoint, Ai and A2 are determined by the variances u^, 
and (t| through Ai = (t^/ct^ and A2 = o"^/(t|. We can estimate these variances 
from the current residuals and current values of a and 6. By assuming that 
niii > 2 for each pair 

^ n Ni mu 

i=i 1=1 j=i 



^ n Ni ^ n 



i=l 1=1 1=1 



We can then plug in the estimates a^, and to get new values of Ai 
and A2 for the next iteration. Instead, if we take the penalized loss function 
viewpoint, we can simply treat Ai and A2 as fixed regularization parameters 
which can be chosen by model selection criteria (see Section 3.3). Hence- 
forth, we refer to the method as adaptive if Ai and A2 are updated after 
each iteration, and as nonadaptive if they are kept fixed throughout the 
optimization. 

3.2. Standard error of the estimates. It is important to obtain the stan- 
dard error of the estimated gradient function. Since it is typically not pos- 
sible to obtain an estimate of the bias for a nonparametric procedure, we 
ignore the bias term and use the best projection of true g in the model 
space as the surrogate center (this is the standard practice in nonparamet- 
ric literature). Thus, equivalently, we provide an estimate of the asymptotic 
variance of /3. Based on the asymptotic analysis presented in Paul, Peng and 
Burman (2009), we derive the following estimate: 

(3.5) V{/3) := E[0 -(3)0- l^f] = ^!^n, 

with W„ = (A„ + B - C^(D„ + A2 /„) ^C„) ^, where, /„ is the nx n iden- 
tity matrix, A„ = gi3j3{e0), C„ = ge^{0,P), D„ = Qeeid,^); where 

n N-i rriii / \ f \^ 

0(3i3{O,P) := ^^X^(^-^(^i/i;ai/i6'i,/3)J i^-Q^ituj;au,9i,P)j ; 

QQp{0,j3) is the nx M matrix with the ith row being 

^^-g^{tiij;aii,9i,/3)i-^{tiij;aii,9i,P)] , i = l,...,n; 
1=1 j=i * VP / 
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and Qgg{6,l3) is the n x n diagonal matrix with the ith diagonal entry 

1=1 j=i V (^^i / 

Note that the matrices A„, C„ and D„ are obtained as byproducts of the 
estimation procedure. An estimate of the standard error of 'g{x) for x in the 
domain of {4>k,M}kLi is therefore given by 

(3.6) SE(5(x)) = [MxfV0)Mx)]'/\ 

where (f)]^{x) := {((>i^m{x), • • • , 4>m,m{x)Y' and V{(5) is as in (3.5). Note that, 
in the given asymptotic framework, we treat 0j's as random effects and the 
initial conditions {an} are assumed to be known. In deriving (3.5), we have 
ignored the correlation structure between 9i and the gradient of the objec- 
tive function with respect to di, which yields a slightly conservative (i.e., 
upwardly biased) estimate of the standard error. Obtaining the asymptotic 
standard error estimates when the initial conditions {an} are estimated from 
the data is beyond the scope of this paper. 

As an alternative way of estimating the standard error, one may also 
use bootstrap where we resample the sample trajectories corresponding to 
each subject, in order to retain the overall structure of the model. The corre- 
sponding bootstrap estimates, though simple to obtain, are computationally 
expensive and we do not pursue this in this paper. 

3.3. Model selection. After specifying a scheme for the basis func- 
tions {i?i>fc,M(')}) ^6 still need to determine various model parameters such as 
the number of basis functions M, the knot sequence, penalty parameters, etc. 
In the literature, AIC/BIC/AICc criteria have been proposed for model se- 
lection of parametric dynamical systems; see, for example, Miao et al. (2009). 
Here we propose an approximate leave-one-curve-out cross-validation score 
for model selection. Under the current context, the leave-one-curve-out CV 
score can be defined as 

n Ni mn 

(3.7) CV:=Y,Y.Y. ^ ' ^^"'^ ' 3^"'^ ) , 

1=1 1=1 j=i 

where 9^ and are estimates of 9i and /3, respectively, based on the 

data after dropping the Ith. curve of the ith subject; and a-j^ is the mini- 
mizer of Yl]'=i^iiji^iiM~'''^ ^^'^ *'^) ^i*^ respect to an; and injiaii,9i, P) := 
{Yiij — Xii{tiij;aii,9i, P))'^ is the prediction error loss. When the initial con- 
ditions an = Xii{0) are i.i.d. random variables and are known (and thus we 
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set a-i = an), the leave-one-curve-out C V score gives an asymptotically 
unbiased estimator of the prediction error. Calculating CV score (3.7) is com- 

putationally very demanding because one needs to obtain ' and (3 

for every pair of (i, I). Therefore, we propose to approximate 0\ and /3 
through a first order Taylor expansion around the estimates 0i,f5 based on 
the full data. We then obtain an approximation of a-^ by minimizing 

the corresponding criterion with the approximations of and ^ im- 
puted. Consequently, we derive an approximate CV score CV by plugging 
these approximations in (3.7), which is computationally inexpensive since 
all the quantities involved in computing CV are byproducts of the Newton- 
Raphson step used in model fitting. This approximation scheme is similar to 
the one taken in Peng and Paul (2009) under the context of functional prin- 
cipal component analysis, which itself is motivated by the work of Burman 
(1990). Detailed derivations are given in the Appendix. 

4. Simulation. In this section we conduct a simulation study to demon- 
strate the effectiveness of the proposed estimation and model selection pro- 
cedures. Since we apply our method to study the plant growth dynamics 
in Section 5, we consider a simulation setting that partly mimics that data 
set. In the simulation, the true gradient function g is represented by = 4 
cubic i?-spline basis functions with knots at (0.35,0.6,0.85,1.1) and ba- 
sis coefficients /3 = (0.1,1.2,1.6,0.4)"'". It is depicted by the solid curve in 
Figure 4. We consider two different settings for the number of measure- 
ments per curve: moderate case — mn^s are independently and identically 
distributed as Uniform [5, 20]; sparse case — m^s are independently and 
identically distributed as Uniform[3, 8]. Measurement times {tnj} are in- 
dependently and identically distributed as Uniform[0, 1]. The scale parame- 
ters OiS are randomly sampled from A^(0,(7g) with uq = 0.1; and the initial 
conditions Oj^'s are randomly sampled from a CqxI^ distribution (to ensure 
positivity as well as to study model robustness), with Ca, /cq > chosen such 
that a = 0.25,(7(1 = 0.05. Finally, the residuals ejzj's are randomly sampled 
from A^(0, a1) with ae = 0.01. Throughout the simulation, we set the number 
of subjects n = 10 and the number of curves per subject Ni = N = 20. Ob- 
servations {Yiij} are generated using the model specified by equations (2.1), 
(2.2) and (3.1). For all settings, 50 independent data sets are used to evalu- 
ate the performance of the proposed procedure. The sample trajectories are 
evaluated using the 4th order Runge-Kutta method [as described in Paul, 
Peng and Burman (2011)] on an equally spaced grid with grid spacings 
/i = 0.0005. 

In the estimation procedure, we consider cubic S-spline basis functions 
with knots at 0.1 + j/M , j = 1, . . . , M, to model g, where M varies from 2 
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Table 1 

Convergence and model selection based on 50 independent replicates 





Model 




a 


known 








a 


estimated 




2 


3 


4 


5 


6 


2 


3 


4 


5 


6 


moderate 


Number converged 


50 


50 


50 


50 


50 


50 


7 


50 


50 


46 




Number selected 








46 


1 


3 








49 


1 





sparse 


Number converged 


50 


50 


50 


50 


50 


50 


5 


49 


44 


38 




Number selected 








45 





5 


1 





47 


1 


1 



to 6. Note that here M = 4 corresponds to the true gradient function. The 
Levenberg-Marqardt step is chosen to be nonadaptive, and the Newton- 
Raphson step is chosen to be adaptive (see Section 3.1 for the definition of 
adaptive and nonadaptive). We examine three different sets of initial val- 
ues for Ai and A2: (i) Ai = cr^/cr^ = 0.04, A2 = cfI/cfI = 0.01 ("true" values); 
(ii) Ai =0.01, A2 = 0.0025 ("deflated" values); (iii) Ai = 0.16, A2 = 0.04 ("in- 
flated" values). It turns out that the estimation and model selection proce- 
dures are quite robust to the initial choice of (Ai, A2), thereby demonstrating 
the effectiveness of the adaptive method used in the Newton-Raphson step. 
Thus, in the following, we only report the results when the "true" values are 
used. 

We also compare results when (i) the initial conditions a are known, and 
hence not estimated; and (ii) when a are estimated. As can be seen from Ta- 
ble 1, the estimation procedure converges well and the true model (M = 4) 
is selected most of the times for all the cases. Mean integrated squared error 
(MISE) and Mean squared prediction error (MSPE) and the corresponding 
standard deviations, SD(ISE) and SD(SPE), based on 50 independent data 
sets, are used for measuring the estimation accuracy of g and 6, respec- 
tively. Since the true model is selected most of the times, we only report 
results under the true model in Table 2. As can be seen from this table, 
when the initial conditions a are known, there is not much difference in 



Table 2 

Estimation accuracy under the true model* 







MISE(g) 


SD(ISE) 


MSPE(0) 


SD(SPE) 


a known 


moderate 


0.069 


0.072 


0.085 


0.095 




sparse 


0.072 


0.073 


0.085 


0.095 


a estimated 


moderate 


0.088 


0.079 


0.086 


0.095 




sparse 


0.146 


0.129 


0.087 


0.094 



*A11 the numbers are multiplied by 100. 
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the performance between the moderate case and the sparse case. On the 
other hand, when a are estimated, the advantages of having more measure- 
ments become more prominent. We also conduct further simulation studies 
(results not reported in details here) to check the effect of increasing the 
noise level, as well as the dispersion of the initial conditions a. When a are 
known, even with = 0.05, the convergence is almost unaffected, and in 
about 75% of the cases the true model (M = 4) is selected. Increasing Ua to 
0.1 does affect convergence, especially for larger M . But under this setting, 
even with cr^ = 0.05, the true model converges in 90% of the cases and is 
selected to be the best in more than 75% of the cases. When a are estimated, 
the convergence deteriorates more obliviously under increased noise levels. 

In Figure 4 we have a graphical comparison of the fits when the initial 
conditions a are known versus when they are estimated in the sparse case. 
In the moderate case, there is very little visual difference under these two 
settings. We plot the true g (solid curve), the pointwise mean of g (bro- 
ken curve), and 2.5% and 97.5% pointwise quantiles (dotted curves) under 
the true model. These plots show that both fits are almost unbiased. Also, 
when a are estimated, there is greater variability in the estimated g at 
smaller values of mainly due to a scarcity of data in that region. Indeed, 
the larger MISE of the estimator of g when initial conditions are estimated 
mainly results from the larger MISE on the domain of g where there is essen- 
tially no observed data. Due to the extrapolation effect, no method without 
using true initial conditions is expected to work well on such a domain, 
especially under a nonparametric setting. This point is illustrated in more 
detail later in this section (cf. Table 3), as well as in the supplementary ma- 
terial [Paul, Peng and Burman (2011), Section S3]. Overall, as can be seen 
from these tables and figures, the proposed estimation and model selection 
procedures perform effectively. 

To evaluate the accuracy of the pointwise standard error estimator given 
in (3.6), in Figure 6 we plotted the average of the estimate (blue curve) over 
50 independent data sets and the ±2 standard error bands of the estimates 
(broken red curves) based on the same 50 independent data sets under the 
true model (M = 4) when a is known. The pointwise standard errors are also 
computed empirically from the converged replicates (black curve) among the 
50 simulation runs. We observe that, although being somewhat conserva- 
tive, (3.6) gives a quite satisfactory estimate of the pointwise standard error 
of g. 

We also compare the performance of the proposed procedure with a two- 
stage approach. Following Chen and Wu (2008b), in the first stage, each 
individual trajectory Xii{-) and its derivative X'^i{-) are estimated by lo- 
cal linear and local quadratic smoothing, respectively. The bandwidths are 
chosen by cross-validation. In the second stage, two different methods for 
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Table 3 

Comparison of estimation accuracy of two-stage estimators (either local quadratic 
smoothing or parametric regression using true model in the second stage) with 
hierarchical Ukelihood estimators (for the selected model, among models with 
M — 2, ... ,6 B-splme basis functions) under the sparse case 

Two-stage estimator 

Method Bandwidths Summary 

in stage 2 in stage I statistics x 6 [—0.5, 0.2] x £ (0.2, 1] x Q, (1, 1.5] 



Local quadratic Optimal Mean(ISE(g)) 3.8 x lO'^ 20.177 7.3 x 10® 

smoothing bandwidths Median(ISE(5)) 4.1 x lO'^ 2.398 1.8 x 10^ 

(SD(ISE(3))) (2.3 x 10**) (330.146) (5.1 x 10^) 

Regression Optimal Mean(ISE(5)) 27.592 28.492 0.063 

(true model) bandwidths Median(ISE(5)) 3.812 2.094 0.004 

(SD(ISE(3))) (423.283) (565.281) (1.249) 



Hierarchical likelihood estimator 
Summary 

Method statistics x 6 [-0.5,0.2] x E (0.2, 1] x E (1, 1.5] 



a known Mean(ISE(g)) 0.006 0.083 0.001 

Median(ISE(5)) 0.003 0.041 0.000 

(SD(ISE(3))) (0.009) (0.106) (0.002) 

a estimated Mean(ISE(5)) 0.710 0.195 0.007 

Median(ISE(5)) 0.025 0.054 0.000 

(SD(ISE(3))) (4.751) (0.789) (0.048) 



estimating g are considered with {X'-i{t)} as response and {Xn^t)} as pre- 
dictor: (i) a least squares regression fit of tlie basis coefficients using the true 
model; (ii) a local quadratic smoothing. A more detailed description of the 
two-stage approach and more simulation studies are given in the supplemen- 
tary material [Paul, Peng and Burman (2011), Section S2]. 

In Table 3 we report the integrated squared errors of the two-stage esti- 
mators as well as those of the hierarchical likelihood estimators (under the 
model selected by CV) for the sparse case. While reporting the risk of the 
estimators, we divide the domain of x into three regions: [—0.5,0.2], (0.2,1] 
and (1, 1.5]. In this simulation, even though the true gradient function g has 
support effectively on [—0.5,1.5], the observed measurements Ynj^s are al- 
most entirely confined in the region (0.2, 1]. Due to the extrapolation effect, 
methods without using the true initial conditions are expected to perform 
(relatively) poorly in the domains where there is no data. Thus, we divide 
the domain into different regions for more informative comparisons across 
methods. We also plot the pointwise mean and median and pointwise 95% 
bands around the mean for the two-stage estimators of g in Figure 5. These 
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results show that the two two-stage estimators are highly biased and vari- 
able. The one using the true model in the second stage has better behavior 
in the regions where there is no data, compared to the fully nonparametric 
estimator. However, the level of bias and variability is much higher than 
the proposed estimator on all three regions. Another important observation 
is that, for the hierarchical likelihood estimator, the median of integrated 
squared errors over the data domain (0.2,1] is comparable for the cases when 
the initial condition a is known and when a is estimated. 

To further compare these two approaches, we conduct another simulation 
study where all Oi's are taken to be zero (equivalently, ag = 0), so that 
there is no subject-specific variability. For this simulation, we also consider 
a sampling design, referred to as "very dense," in which the number of 
measurements per curve is Uniform [60, 100] so that the first stage estimates 
of the two-stage methods are more accurate. The number of subjects is 
chosen to be n = 10 and there is only one curve per subject (i.e., Ni = l). 
The results [reported in Table S5-5 in Paul, Peng and Burman (2011)] show 
that the proposed method again gives better estimates and it is much less 
biased (even when the initial conditions are estimated) . The mean integrated 
squared error over the data domain (0.2, 1] of the hierarchical likelihood 
estimator, when a is estimated, is much smaller than that of the two-stage 
method, even when the true model is used in the second stage. For a more 
detailed comparison of the two approaches, see Section S2 of Paul, Peng and 
Burman (2011). Moreover, we also do simulations when the true gradient 
function g is more complex and does not belong to the model space. The 
overall picture for the performance of the proposed estimation and model 
selection procedures, as well as the comparison with the two-stage methods, 
is consistent with the results presented here. See Section S3 of Paul, Peng 
and Burman (2011) for details. 

Finally, we comment on the computational time and the rate of conver- 
gence of the proposed procedure. These depend on several factors, espe- 
cially the model complexity and bias, noise level and criteria for conver- 
gence. Typically, the convergence is faster when a is treated as known, as 
opposed to when it is estimated from the data. For the simulation study 
presented here, under the true model (M = 4), with a known, convergence 
is generally achieved in about 30 to 40 Levenberg-Marquardt steps and 
often in only 2 to 3 Newton-Raphson steps. The number of Levenberg- 
Marquardt steps required for convergence almost doubles when a is esti- 
mated. For biased models [including those presented in Section S3 of Paul, 
Peng and Burman (2011)], the convergence often takes more steps (up to 
150 Levenberg-Marquardt steps and several Newton-Raphson steps). The 
computational times for the simulation study presented in this section are 
summarized in Table 4. These computations were carried out on a 64-bit 
Linux machine with Intel Core 2 Quad processors running at 3.2 GHz and 
with 8 GB RAM. 
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Table 4 

Computational cost for the simulation study in Section 4- Reported quantities are the 
average time in seconds and standard deviations (within brackets) over 50 replicates 
(including the ones without convergence) 



Model (M) 


2 


3 


4 


5 


6 


moderate a known 


11.40 


20.34 


28.14 


41.51 


42.29 




(0.24) 


(0.68) 


(0.73) 


(1.53) 


(2.39) 


a estimated 


21.22 


89.20 


44.23 


56.34 


69.89 




(1.18) 


(18.38) 


(4.54) 


(9.33) 


(23.05) 


sparse a known 


11.50 


20.35 


28.25 


41.53 


42.57 




(0.33) 


(0.63) 


(0.80) 


(1.58) 


(3.05) 


a estimated 


24.01 


93.58 


47.06 


68.57 


89.57 




(1.59) 


(17.02) 


(11.55) 


(25.08) 


(38.75) 



5. Application: Plant growth data. In this section we apply the proposed 
method to the plant growth data from Sacks, Silk and Burman (1997) de- 
scribed in the earlier sections. One goal of this study is to investigate the 
effect of water stress on growth displacement rate within the meristem of 
the primary root of maize seedlings. Note that, meristem is the tissue in 
plants consisting of undifferentiated cells and found in zones of the plant 
where growth can take place. The growth displacement rate is defined as 
the rate of displacement of a particle placed along the root and it should 
not be confused with "growth rate" which usually refers to the derivative of 
the growth trajectory with respect to time. For more details, see Sacks, Silk 
and Burman (1997). Growth displacement rate is important to infer the cell 
division rate — the local rate of formation of cells — that is not directly ob- 
servable in a changing population of dividing cells. The growth displacement 
rate is also needed for understanding some important physiological processes 
such as biosynthesis [Silk and Erickson (1979); Schurr, Walter and Rascher 
(2006)]. Moreover, a useful growth descriptor called the "relative elemental 
growth rate" (REGR) can be calculated as the gradient of the growth dis- 
placement rate (with respect to distance), which shows quantitatively the 
magnitude of growth at each location within the organ. 

The data consist of measurements on ten plants from a control group and 
nine plants from a treatment group where the plants are under water stress. 
The meristem region of the root, where the measurements are taken, is shown 
in Figure 1 (left panel). The primary roots had grown for approximately 
18 hours in the normal and stressed conditions before the measurements 
were taken. The roots were marked at different places using a water-soluble 
marker and high-resolution photographs were used to measure the displace- 
ments of the marked places. The measurements were in terms of distances 
from the root cap junction (in millimeters) and were taken for each of these 
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Fig. 3. Growth trajectories for plant data. Left panel: a plant in the control group. Right 
panel: a plant in the treatment group. 



marked places, hereafter markers, over an approximate 12-hour period while 
the plants were growing. The measurement process is shown schematically 
in the right panel of Figure 1. In Figure 3 the growth (displacement) trajec- 
tories of one plant with 28 markers in the control group and another plant 
with 26 markers in the treatment group are depicted. Note that measurement 
times are different for these two plants. Also, measurements were only taken 
in the meristem. Thus, whenever a marker grew outside of the meristem, its 
displacement would not be recorded at later times anymore. This, together 
with possible technical failures (in taking measurements), is the reason that 
in Figure 3 some growth trajectories were cut short. More sophisticated data 
acquisition techniques are described in Walter et al. (2002) and Basu et al. 
(1998), where the proposed method is also potentially applicable. 

Many studies in plant science such as Silk (1994), Sacks, Silk and Burman 
(1997) and Fraser, Silk and Rost (1990) all suggest reasonably steady growth 
velocity across the meristem under both normal and water-stress conditions 
at an early developmental stage. Moreover, exploratory regression analysis 
based on empirical derivatives and empirical fits of the growth trajectories 
indicates that time is not a significant predictor and, thus, an autonomous 
model is reasonable. This also means that time zero does not play a role in 
terms of estimating the dynamical system and there is also no additional 
variation associated with individual markers. In addition, the form of the 
gradient function g is not known to the plant scientists, only its behav- 
ior at root cap junction and at some later stage of growth are known [Silk 
(1994)]. Figure 2, the scatter plot of empirical derivatives versus empirical 
fits in the treatment group, indicates that there is an increase in the growth 
displacement rate starting from a zero rate at the root cap junction, fol- 
lowed by a nearly constant rate beyond a certain location. This means that 
growth stops beyond this point and the observed displacements are due to 
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Fig. 4. True and fitted gradient function g by hierarchical likelihood approach for the 
sparse case. The true model (with M = 4 B-spline basis functions with equally spaced 
knots) is used in fitting. Top panel: initial conditions 
conditions a are estimated. 



a are known. Bottom panel: initial 



growth in the part of the meristem closer to the root cap junction. Where 
and how growth stops is of considerable scientific interest. These boundary 
behaviors also imply that a linear ODE model is obviously not appropri- 
ate. In addition, popular parametric models such as the Michaelis-Menten 
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Fig. 5. True and fitted gradient function g by two-stage approach for the sparse case. 
Top panel: the second stage uses local quadratic smoothing. Bottom panel: the second 
stage uses regression under the true model. 



type either do not satisfy the boundary constraints and/or have parame- 
ters without clear interpretations in the current context. Moreover, there 
is some controversy among plant scientists about the possible existence of 
a "growth bump" in the middle of the meristem. Taking all these features 
into consideration, the semiparametric model proposed in this paper is ap- 
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Fig. 6. Standard error estimates for the simulation study in Section 4- Pointwise stan- 
dard error of 'g for the sparse case with initial conditions a known. The true model (with 
M = 4 B-spline basis functions) is used in fitting. Solid black curve: pointwise standard 
error computed from 50 replicates. Solid blue curve: averaged pointwise standard error es- 
timates from (3.6) (based on 50 replicates). Broken red curve: 2 standard deviations bands 
for the estimated pointwise standard error (based on 50 replicates) . 

propriate for investigating the scientific questions associated with this study, 
in particular, comparing the basehne growth displacement rates between the 
treatment and control groups. Notice that, in order for the proposed estima- 
tion method to give an accurate estimate of the gradient function, we need 
only that the measurement on the state variable x is dense in its domain, 
and that the measurement errors are independent across time. These are 
satisfied for the plant data since, even though each trajectory is recorded at 
a relatively small number of time points, there is a fairly large number of 
trajectories for each plant, corresponding to the different initial conditions. 
Note that, for each plant, the number of measurements is indeed the sum 
total of all the measurements for its different trajectories. Moreover, the 
proposed method combines information across different plants (subjects), 
which allows one to fit the model reasonably well even with relatively few 
measurements per subject. 

Now consider the model described in Section 2. For the control group, we 
have the number of curves per subject Ni varying in between 10 and 29; and 
for the water stress group, we have 12 < A^j < 31. The observed growth dis- 
placement measurements {Ynj : j = 1, . . . , rrinj = 1, . . . , Ni}^^-^ are assumed 
to follow model (3.1), where mu is the number of measurements taken 
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for the ith. plant at its Ith. marker, which varies between 2 and 17; and 
{tiij = 1) • ■ • j'^'ti} are the times of measurements, which are in between 
[0, 12] hours. Altogether, for the control group there are 228 curves with 
a total of 1,486 measurements and for the treatment group there are 217 
curves with 1,712 measurements in total. Note that the constraint at the 
root cap junction corresponds to g{0) = = g'{0), which is imposed by sim- 
ply omitting the constant and linear terms in the spline basis. The flatness 
of g at a (unknown) distance away from the root cap junction means that 
g'{x) = for x> A for some constant ^4 > 0. In order to impose this, as part 
of the objective function (3.3), we use 



f-2A r I-2A 

P'^B(3:=Xr {g'{x)fdx = \R0^ 0'(x)(0'(x))^dx 
J A VJa 



/3, 



where = {4>i,m, ■ ■ ■ ,4'm,m)'^ and Xr is a large positive number quantify- 
ing the severity of this constraint; and ^4 > determines where the growth 
displacement rate becomes a constant. A and Xr are both adaptively de- 
termined by the model selection scheme discussed in Section 3.3. Moreover, 
since the initial conditions (marker positions) {an} are chosen according to 
some fixed experimental design (though measured with errors), it is not ap- 
propriate to shrink their estimates toward a fixed number. Hence, we set 
Ai = in the loss function (3.3). 

Before fitting the proposed model, we first describe a simple regression- 
based method for getting a crude initial estimate of the function g{-), as 
well as selecting a candidate set of knots. This involves (i) computing the 

_^0) ^ 

re-scaled empirical derivatives e » X-^j of the sample curves from the data, 
where the empirical derivatives are defined by taking divided differences: 

^'ilj ■= O^iiU+i) ~ - kij), and 6'> is a preliminary estimate 

of 6i ; and (ii) regressing the re-scaled empirical derivatives onto a set of basis 
functions evaluated at the corresponding sample averages: Xuj := (yj^(j_(_i) + 
Yiij)/2. In this paper we use the basis {x^,x^,{x — Xk)%}f^i with a pre- 
specified, dense set of knots {xk}^^i- Then, a model selection procedure, 
like the stepwise regression, with either AIC or BIC criterion, can be used to 
select a set of candidate knots. In the following, we shall refer to this method 
as stepwise-regression. A similar method is employed by Sacks, Silk and 
Burman (1997). The resulting estimate of g and the selected knots can then 
act as a starting point for the proposed procedure. We expect this simple 
method to work reasonably well only when the number of measurements per 
curve is moderately large. Comparisons given later (Figure 10) demonstrate 
a clear superiority of the proposed method over this simple approach. 

We fit the proposed model to the control group and the treatment group 
separately. For the control group, we first use the procedure described in 
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Section 3.1 with g represented in cubic i?-splines with M (varying from 
2 to 12) equahy spaced knots. At this stage, we set = 1m, = On, 
a™' = {Xii{tiii):l = 1, . . . , Ni)f^i. The criterion based on the approximate 
CV score [equation (A. 7) in the Appendix] selects the model with M = 9 
basis functions. This is not surprising since, when equally spaced knots are 
used, usually a large number of basis functions are needed to fit the data 
adequately. In order to get a more parsimonious model, we consider the 
stepwise-regression method to obtain a candidate set of knots. We use 
28 equally spaced candidate knots on the interval [0.5, 14] and use the fitted 
values {6f^''}j^^ from the previous i?-spline fit. The AIC criterion selects 
a model with 10 knots among these 28 candidate knots, plus the quadratic 
term. We then consider various submodels with knots chosen from this set of 
selected knots and fit them again using the proposed estimation procedure. 
The approximate CV scores for a number of different submodels are re- 
ported in Table 5. The parameters A and are also varied and selected by 
the approximate CV score. Based on the approximate CV score, the model 
with knot sequence (3.0,4.0,6.0,9.0,9.5) and {A,Xr) = (9,10^) is selected. 
Also note that the model selected by stepwise-regression has a larger 
CV score than those of the models reported in Table 5. A similar procedure 
is applied to the treatment group. It turns out that the model with knot 
sequence (3.0,3.5,7.5), which is also selected by stepwise-regression, has 

Table 5 

Model selection for real data. Control group: approximate CV scores for four submodels 
of the model selected by the AIC criterion in the stepwise-regression step. 
Ml: fcnots= (3.0,4.0,5.0,6.0,9.0,9.5); M2: fcnots = (3.0, 4.0, 5.5, 6.0, 9.0, 9.5); 
M3: knots ^ (3.0,4.0,6.0,9.0,9.5); Kj.: knots ^ (3.0,4.5,6.0,9.0,9.5). Treatment group: 
approximate CV scores for the model M: knots — (3.0, 3.5, 7.5) 



Control 



Model 




\r = 10^ 






Ah = 10^ 




A = 8.5 


A = 9 


A = 9.5 


A = 8.5 


A = 9 


A = 9.5 


Ml 


53.0924 


53.0877 


53.1299 


54.6422 


53.0803 


53.1307 


M2 


53.0942 


53.0898 


53.1374 


54.5190 


53.0835 


53.1375 


M3 


53.0300 


53.0355 


53.0729 


53.8769 


53.0063 


53.0729 


M4 


53.0420 


53.0409 


53.0723 


54.0538 


53.0198 


53.0722 


Treatment 


Model 




\n = 10^ 






Xr = 10^ 




A = 7 


A = 7.5 


A = 8 


A = 7 


A = 7.5 


A = 8 


M 


64.9707 


64.9835 


64.9843 


65.5798* 


64.9817 


64.9817 



*No convergence. 
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Fig. 7. Fitted gradient function p, and pomtwise 2 standard error bands under the se- 
lected models for control and treatment groups. 

considerably smaller CV score compared to all other candidate models, and, 
hence, we only report the CV scores under this model in Table 5 with var- 
ious choices of {A,Xji). It shows that {A,Xji) = (7,10^) has the smallest 
approximate CV score. 

Figure 7 shows the estimated gradient functions g under the selected 
models for the control and treatment groups, respectively. Apart from g, we 
also plot the estimated pointwise two-standard error bands using (3.6). The 
fact that the bands are generally nonoverlapping except for a small region 
clearly indicates that the baseline growth displacement rates for the control 
and treatment groups are different. The plot also shows that there is no 
growth bump for either group. In the part of the meristem closer to the 
root cap junction (distance within ~5.5 mm), the growth displacement rate 
for the treatment group is higher than that for the control group. This is 
probably due to the greater cell elongation rate under water stress condition 
in this part of the meristem so that the root can reach deeper in the soil to 
get enough water. This is a known phenomenon in plant science. The growth 
displacement rate for the treatment group flattens out beyond a distance of 
about 6 mm from the root cap junction. The same phenomenon happens 
for the control group, however, at a further distance of about 8 mm from 
the root cap junction. Also, the final constant growth displacement rate of 
the control group is higher than that of the treatment group. This is due 
to the stunting effect of water stress on these plants, which results in an 
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CO 
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distance from root cap junction (in mm) 



Fig. 8. Fitted relative elemental growth rate (REGR) under the selected models for con- 
trol and treatment groups, respectively. The REGR is computed by differentiating the es- 
timated gradient function g. 

earlier stop of growth and a slower cell division rate. Figure 8 shows the 
estimated relative elemental growth rates (i.e., g') for these two groups. 
Relative elemental growth rate (REGR) relates the magnitude of growth 
directly to the location along the meristem. For both groups, the growth is 
fastest in the middle part of the meristem (~3.8 mm for control group and 
~3.1 for treatment group), and then growth dies down pretty sharply and 
eventually stops. We observe a faster growth in the part of the meristem 
closer to the root cap junction for the water stress group and the growth 
dies down more quickly compared to the control group. The shape of the 
estimated g may suggest that it might be modeled by a logistic function with 
suitably chosen location and scale parameters, even though the scientific 
meaning of these parameters is unclear and the boundary constraints are 
not satisfied exactly. As discussed earlier, there is insufficient knowledge 
from plant science to suggest a functional form beforehand. This signifies 
the major purpose and advantage of nonparametric modeling, which is to 
provide insights and to suggest candidate parametric models for further 
studies. 

In order to check how our method performs in terms of estimating in- 
dividual sample trajectories, we solved the differential equation model for 
each plant i with fitted values of Xj/(0), 9i and g. Figure 9 shows the fit- 
ted (under the selected model) and observed trajectories for three plants 
each from the control and the treatment groups. As can be seen from this 
figure, although there are subject-specific variabilities in the fits, the over- 
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treatment group : plant 5 
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Fig. 9. Observed (black) and fitted (red) trajectories (under the selected models) for the 
plant data. Every third trajectory of each plant is plotted. Top panel: (from left) plant # 
6, 9, 10 in the control group. Bottom panel: (from left) plant # 3, 5, 7 in the treatment 
(water stress) group. 



all shapes of the trajectories are captured fairly well. Figure 10 shows the 
residual versus time plot for the treatment group. The plot for the con- 
trol group is similar and thus is omitted. This plot shows that the pro- 
posed procedure based on minimizing the objective function (3.3) has much 
smaller and more evenly spread residuals (SSE = 64.50) than the fit by 
stepwise-regression (SSE = 147.57), indicating a clear benefit of the 



initial estimate Final estimate 




2 4 6 8 11 12 2 4 6 8 10 12 

time (In hrs) time (In hrs) 



Fig. 10. Residual versus time plots for the treatment group. Left panel: fit by 
stepwise-regression. Right panel: fit by the proposed method based on maximizing the 
hierarchical likelihood. 
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more sophisticated approach. Overah, the estimation and model selection 
procedures give reasonable fits under both experimental conditions. Note 
that, for the first six hours, the residuals (right panel of Figure 10) show 
some time-dependent pattern, which is not present for later times. Since 
throughout the whole 12 hour period the residuals remain small compared 
to the scale of the measurements, the autonomous system approximation 
seems to be adequate at least for practical purposes. Nevertheless, modeling 
growth dynamics through nonautonomous systems may enable scientists to 
determine the stages of growth that are not steady across a region of the 
root. This aspect is discussed briefly in Section 6. 

6. Discussion. The model and the fitting procedures presented in this 
paper are quite flexible and effective in terms of modeling autonomous dy- 
namical systems nonparametrically when the data are from a number of 
subjects and when the underlying population level dynamics is of interest. 
When applying the proposed method to the plant growth data, we obtain 
results that are scientifically sensible. For the plant data, g is nonnegative 
and, thus, a modeling scheme imposing this constraint may be more advan- 
tageous. However, the markers are all placed at a certain distance from the 
root cap junction, where the growth displacement rate is already positive, 
and the total number of measurements per plant is moderately large. These 
mean that explicitly imposing nonnegativity is not crucial for the plant data, 
a fact also supported by the estimates which turn out to be nonnegative and 
the simulation results where the resulting estimators of g are always non- 
negative for the moderate and/or "a known" cases. In general, if g is strictly 
positive (strictly negative) over the domain of interest, then we can model 
the logarithm of g (resp., —g) by basis representation. 

The proposed approach is flexible in terms of incorporating various con- 
straints on the dynamics and is able to capture features of the dynamical 
system which are not known to us a priori. It can also be extended to incor- 
porate covariate effects, as well as to model nonautonomous systems which 
are currently under investigation. Even though in this paper we use the plant 
growth data as an illustration, the proposed framework is potentially useful 
to many other studies with similar types of data, where estimating the un- 
derlying dynamical system is of interest. For example, the data set collected 
as part of the Multicenter AIDS Cohort Study [Kaslow et al. (1987); Diggle 
et al. (2002)] can be used to study the dynamics of the CD4-I- counts. In- 
vestigating the dynamics of CD4-I- counts at a population level, while also 
taking into account individual effects, is of great importance to understand 
the progression of AIDS. This data set consists of 2,376 measurements of 
CD4-|- cell counts against time since seroconversion (time when HIV be- 
comes detectable) for 369 infected men enrolled in the study. In this data 
set, each patient is a subject and there is one sample curve associated with 
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each subject which reflects CD4+ counts over time. Moreover, each curve 
is only observed at a few time points and the set of measurement times is 
different across patients. The estimation procedure proposed in this paper 
can be adjusted appropriately to deal with such scenarios more effectively. 
Specifically, in order to deal with a large number of random effects, instead 
of the hierarchical likelihood approach, we can adopt a marginal maximum 
likelihood approach. These are topics of our ongoing research. 

APPENDIX 

Gradient of the sample trajectories. Note that Xii{-) satisfies 

(A.l) Xa{t) = aii+ e'^^y^Mk,Mi^^lis))ds, t€[0,l]. 

k=i 

Differentiating (A.l) with respect to the parameters, we have 
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dXiijs) 



M 



k=l 



ds 



for i = 1, . . . , n; I = 1, . . . , Ni; r = 1, . . . , M. In other words, these functions 
satisfy the linear differential equations: 
, M 



k=l 

M 



M 



-X^lit) = X^l{t)e''Y.M'kMXait)) + e'^Y.MkM{Xu{t)), 



k=l 



M 



k=l 



-X^[{t) = xi^fit)e'^^/3,cPl^iXuit)) + e'^AM^uit)), 4^(0) = 0. 



k=l 



If the au^s are positive and the function (7^ := "^^=1 h<t^k,M is positive on the 
domain of aj/'s, then the trajectories Xnit) are nondecreasing in t. In this 
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case, and more generally, whenever the solutions exist on the time interval 
[0, 1] and is continuously differentiable the gradients of the trajectories 
can be solved explicitly: 

9l3[Xii{0)) 

(A.3) Xfl{t) = e''tgp{Xa{t)), 

(A.4) X^fit) = g,{Xa{t)) ^"^'^ ^f^^ dx. 



IXa(0) {9l3{x)f 

We verify (A.4). Proofs (A.2) and (A.3) are similar. We can express 



Xf^(t)=e^ 



( (f)r,M{Xii{s))exp(^e^' g'i^{Xii{u)) dvi^ ds 



[using X[i{u)=e'^^gp{Xu{u))] 

t 

4>r,M [Xii (s) ) exp(log gf3 {Xii (t)) - log g/s {Xu (s))) ds 



{g0{Xa{s))f 
= gf3{Xii{t)) / - — i-^dx. 

Derivation of CV. Observe that, when evaluated at the estimate a, 
and P based on the full data, 

_d_ 

(A.5) 
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+2A2^. = 0, i = l,...,n- 
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Whereas, when evaluated at the drop (i, /)-estimates, a^-^ *'\/3^ \ 
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Expanding the left-hand side of (A. 6) around /3 and 6, and using (A. 5), we 
obtain the following first order approximations: 

-1 



i ^ i 



+ 



f)2 pcv 



+ 2X2 



EE 

i'=ij'=i 

- „ N-, m-ij, r.2pcv 



E 



+ 2B 



E 



In the above, the gradients and Hessians of I'll - ^ evaluated at (a, 
and, thus, they have already been computed on a fine grid in the course of ob- 
taining these estimates. Hence, there is almost no additional computational 
cost to obtain these approximations. Now for i = 1, . . . , n; / = 1, . . . , A'j, define 

rriii 



A-ii) 



: arg mm 

a 



where a is the estimator of a obtained from the full data. Finally, the ap- 
proximate leave-one-curve-out cross-validation score is 



(A.7) 
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SUPPLEMENTARY MATERIAL 

Supplement to "Semiparametric modeling of autonomous nonlinear dy- 
namical systems with application to plant growth" 

(DOI: 10.1214/11-AOAS459SUPP; .pdf). The supplementary materials pro- 
vide additional details on the computational schemes. It also contains further 
simulation studies elucidating the performance of the proposed estimators 
under scenarios not covered in the main article. 
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